SLOPES OF KANTOROVICH POTENTIALS AND EXISTENCE 
OF OPTIMAL TRANSPORT MAPS IN 
METRIC MEASURE SPACES 



LUIGI AMBROSIO AND TAPIO RAJALA 

Abstract. We study optimal transportation with the quadratic cost function 
in geodesic metric spaces satisfying suitable non-branching assumptions. We 
introduce and study the notions of slope along curves and along geodesics and
we apply the latter to prove suitable generalizations of Brenier's theorem of 
existence of optimal maps. 



1. Introduction 

The problem of finding an optimal way to transport mass has a long history, 
starting from Monge's seminal paper [14]. The optimality of a transport can be
measured in many ways, depending on the choice of the cost function. In this 
paper we focus on the case when the cost is the square of the distance. 

Given two positive and finite measures μ and ν on some metric space (X, d) with the same total mass, which we may normalize to 1, our task is then to study whether the infimum

$$\inf \int_X d^2(x,T(x))\,d\mu(x), \qquad (1.1)$$

taken over all possible μ-measurable maps T: X → X which send the measure μ to ν, is attained. If such a minimizing map exists, we call it an optimal transport map between μ and ν. Existence of optimal maps, or even of admissible ones, is problematic: for instance, no admissible map exists when μ is a Dirac mass and ν is not a Dirac mass.
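As a toy illustration of (1.1), assume (hypothetically) that $\mu = \frac1n\sum_i\delta_{x_i}$ and $\nu = \frac1n\sum_j\delta_{y_j}$ are uniform measures on n distinct points of the real line, with d(x, y) = |x − y|. An admissible map must then send the atoms of μ bijectively onto those of ν, so the Monge problem reduces to a minimisation over permutations. The helper `monge_brute_force` below is a minimal sketch of that reduction; it is exponential in n and meant only for tiny examples.

```python
from itertools import permutations


def monge_brute_force(xs, ys):
    """Minimise (1/n) * sum_i |x_i - y_sigma(i)|^2 over all permutations sigma.

    xs, ys: lists of n distinct reals, the atoms of two uniform measures on the line.
    Returns the optimal cost and an optimal permutation (as a tuple).
    """
    n = len(xs)
    if n != len(ys):
        # this sketch only covers uniform measures with the same number of atoms
        raise ValueError("expected the same number of atoms")
    best_cost, best_sigma = None, None
    for sigma in permutations(range(n)):
        # the map T(x_i) = y_sigma(i) pushes mu forward to nu
        cost = sum((xs[i] - ys[sigma[i]]) ** 2 for i in range(n)) / n
        if best_cost is None or cost < best_cost:
            best_cost, best_sigma = cost, sigma
    return best_cost, best_sigma


cost, sigma = monge_brute_force([0.0, 1.0, 3.0], [2.0, 0.0, 5.0])
print(cost, sigma)  # the optimal map sends 0 -> 0, 1 -> 2, 3 -> 5
```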

Kantorovich's relaxed formulation [10] of the optimal transport problem consists in finding the infimum

$$\inf \int_{X\times X} d^2(x,y)\,d\pi(x,y) \qquad (1.2)$$

over all possible transport plans, i.e. probability measures π on X × X which have μ and ν as marginals. Again, if there is a measure which attains the infimum, it is called an optimal transport plan between μ and ν. Notice that transport plans can split mass, and so they avoid the problem faced by transport maps. In fact, not only is the Kantorovich formulation of the problem well-posed, but the infimum is attained (possibly infinite) under the sole assumption that (X, d) is complete and separable. Since we will be dealing with geodesic metric spaces, we will mostly work with the equivalent formulation in terms of geodesic transport plans, i.e. probability measures in the space Geo(X) of constant speed geodesics parameterized on [0, 1], with marginal conditions at t = 0 and t = 1.
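In one dimension the relaxed problem (1.2) can be solved explicitly: it is a classical fact that, for the quadratic cost on the real line, the monotone (quantile) coupling, which matches mass in increasing order, is optimal. The sketch below, with the hypothetical helper `monotone_coupling` restricted to finitely supported measures on ℝ and exact rational masses, builds that plan and shows how a plan splits the Dirac mass δ_0 between −1 and 1, something no transport map can do.

```python
from fractions import Fraction


def monotone_coupling(mu, nu):
    """Monotone (north-west corner) coupling of two discrete measures on the line.

    mu, nu: lists of (point, mass) pairs with equal total mass (masses as Fractions).
    Returns a list of (x, y, mass) triples describing the transport plan.
    """
    mu, nu = sorted(mu), sorted(nu)
    plan = []
    i = j = 0
    rem_x, rem_y = mu[0][1], nu[0][1]
    while i < len(mu) and j < len(nu):
        m = min(rem_x, rem_y)  # mass moved from the i-th atom of mu to the j-th atom of nu
        if m > 0:
            plan.append((mu[i][0], nu[j][0], m))
        rem_x -= m
        rem_y -= m
        if rem_x == 0:  # the current atom of mu is exhausted: move to the next one
            i += 1
            rem_x = mu[i][1] if i < len(mu) else 0
        if rem_y == 0:  # the current atom of nu is filled: move to the next one
            j += 1
            rem_y = nu[j][1] if j < len(nu) else 0
    return plan


def quadratic_cost(plan):
    """Integral of |x - y|^2 with respect to the plan."""
    return sum(m * (x - y) ** 2 for x, y, m in plan)


half = Fraction(1, 2)
plan = monotone_coupling([(0, Fraction(1))], [(-1, half), (1, half)])
print(plan, quadratic_cost(plan))
```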

In general, it seems to be a difficult problem to find necessary and sufficient conditions under which Monge's problem has a solution, but by now several sufficient conditions are known. For the quadratic cost in the Euclidean setting it was proved independently by Brenier [6] and Smith and Knott [16] that there exists a unique optimal map T, given by the gradient of a convex function, provided that μ is absolutely continuous with respect to the Lebesgue measure. This result was generalized to Riemannian manifolds by McCann [13], to Alexandrov spaces by Bertrand [5], to the Heisenberg group by Ambrosio and Rigot [1] and, very recently, to non-branching metric spaces with Ricci curvature bounded from below (in the sense of Lott, Sturm and Villani) by Gigli [8]. Notice that in all these results a reference measure m (Lebesgue measure, Riemannian volume, Haar measure, etc.) plays a role, so the proper setting for this question is the family of metric measure spaces (X, d, m).

In another recent paper [4], a metric Brenier theorem is proved under mild assumptions on (X, d, m), see Theorem 10.3 and Remark 10.7 therein. In the case when (X, d) has bounded diameter and m is a finite measure, the main assumption is the existence of bounds on the relative entropy along geodesics (a condition weaker than the $CD(K,\infty)$ condition of Lott, Sturm and Villani), and the metric Brenier theorem states that, for any optimal geodesic plan π, it holds

$$|\nabla^+\varphi|(\gamma_0) = d(\gamma_0,\gamma_1) \qquad \pi\text{-a.e. in } \mathrm{Geo}(X) \qquad (1.3)$$

(here φ is any Kantorovich potential and $|\nabla^+\varphi|$ is its ascending slope). In other words, the transportation distance depends μ-a.e. only on the initial point.

This result raises some questions that we plan to investigate in this paper: the first one is to understand under which additional assumptions one can really recover an optimal map, the second one is about the differentiability of φ along the geodesics used by the optimal plan.

In connection with the first question we start from this heuristic idea, more or less implicit in many proofs: under appropriate structural assumptions on the space, (1.3) identifies the "initial velocity" of the geodesic. Indeed, assuming suitable non-branching assumptions on the space and on its tangent metric spaces, we can perform a blow-up analysis that leads to the existence of optimal maps. The proof of this result requires a detailed analysis of the proof of the metric Brenier theorem in [4] and the introduction of a sharper notion of ascending slope, namely the ascending slope $|\nabla^+_g\varphi|$ along geodesics. Since we believe that this concept has an independent interest, we compare this slope to the usual one and to the slope along curves, we provide an example and we raise some open problems. Coming back to the existence of optimal maps, our result covers as a particular case the Euclidean, the Riemannian and the Alexandrov case, see also the paragraph immediately after Theorem 4.3 for a more detailed discussion.

In connection with the second question, a "differentiability in mean" has already been proved in Theorem 10.4 of [4], namely

$$\lim_{t\downarrow 0}\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{d(\gamma_0,\gamma_t)} = |\nabla^+\varphi|(\gamma_0) \quad \text{in } L^2(\mathrm{Geo}(X),\pi). \qquad (1.4)$$

This weak differentiability property plays an important role in the subsequent paper [2], for the computation of the derivative of the entropy along geodesics. Here, under an additional doubling assumption on m, we are able to improve (1.4) to a pointwise differentiability property, so that

$$\varphi(\gamma_t) = \varphi(\gamma_0) - t\,|\nabla^+\varphi|^2(\gamma_0) + o(t)$$

for π-a.e. γ ∈ Geo(X).
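In ℝ² with the Euclidean distance, the metric Brenier identity $|\nabla^+\varphi|(\gamma_0) = d(\gamma_0,\gamma_1)$ and the pointwise expansion stated above can be checked numerically on an explicit example. The sketch assumes the hypothetical linear map T(x) = Ax, with A symmetric positive definite: then T = ∇u for the convex function u(x) = ⟨Ax, x⟩/2, and, for c(x, y) = |x − y|²/2, φ(x) = |x|²/2 − u(x) is a Kantorovich potential whose slope is |∇φ(x)| = |x − T(x)|. The difference quotient of φ along the segment from x to T(x) approaches −|∇φ|²(x) at rate O(t).

```python
import math

# Hypothetical data: a symmetric positive definite matrix A and the map T(x) = A x.
A = [[2.0, 0.5], [0.5, 1.0]]


def T(x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


def phi(x):
    """phi(x) = |x|^2 / 2 - <A x, x> / 2."""
    Ax = T(x)
    return 0.5 * (x[0] ** 2 + x[1] ** 2 - (x[0] * Ax[0] + x[1] * Ax[1]))


def grad_phi(x):
    Ax = T(x)
    return [x[0] - Ax[0], x[1] - Ax[1]]


x = [1.0, 1.0]
Tx = T(x)
slope = math.hypot(*grad_phi(x))                         # |grad phi|(x)
transport_dist = math.hypot(Tx[0] - x[0], Tx[1] - x[1])  # d(x, T(x))


def quotient(t):
    """(phi(gamma_t) - phi(gamma_0)) / t along the segment gamma from x to T(x)."""
    gamma_t = [(1 - t) * x[0] + t * Tx[0], (1 - t) * x[1] + t * Tx[1]]
    return (phi(gamma_t) - phi(x)) / t


for t in (1e-1, 1e-2, 1e-3):
    print(t, quotient(t), -slope ** 2)
```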

Acknowledgement. The authors acknowledge the support of the ERC ADG
GeMeThNES. The second author also acknowledges the support of the Academy 
of Finland, project no. 137528. 



2. Non-branching metric spaces 

Let us start by laying out the definitions for the metric spaces that will be used in this paper. First of all, we will be working exclusively in metric spaces (X, d) which are complete and separable. Second, by a measure in (X, d) we mean a nonnegative Borel measure, finite on bounded sets. We will mainly consider metric spaces X equipped with a doubling measure m, meaning that there exists a constant 0 < C < ∞ so that for all 0 < r < diam(X) and x ∈ X we have

$$m(B(x,2r)) \le C\, m(B(x,r)).$$

A related notion for metric spaces where the measure has not been specified is that of a doubling metric space, which means that there exists an integer N ≥ 1 so that, for all 0 < r < ∞, any ball of radius 2r can be covered by N balls of radius r. It is obvious that if there exists a doubling measure on X then the space X has to be doubling as well. The converse is also true for complete metric spaces, see for example [12] and [11].
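On a finite sample the doubling condition can only be probed, not verified, but a quick estimate helps to build intuition. The sketch below (the helper `empirical_doubling_ratio` is hypothetical) computes the worst ratio m(B(x, 2r))/m(B(x, r)) over the sample points and a few radii 0 < r < diam, for the counting measure on the integer grid {0, 1, ..., 100} ⊂ ℝ.

```python
def empirical_doubling_ratio(points, radii, weights=None):
    """Largest ratio m(B(x, 2r)) / m(B(x, r)) over the sample points x and the given radii r.

    points: list of reals, viewed as a finite metric space with d(x, y) = |x - y|;
    weights: masses of the points (default 1.0 each, i.e. the counting measure).
    Closed balls are used, so the centre always contributes positive mass.
    """
    if weights is None:
        weights = [1.0] * len(points)

    def ball_mass(centre, r):
        return sum(w for p, w in zip(points, weights) if abs(p - centre) <= r)

    return max(ball_mass(x, 2 * r) / ball_mass(x, r) for x in points for r in radii)


grid = list(range(101))  # the points 0, 1, ..., 100
# 3.0, attained at r = 0.6, where B(x, r) = {x} and B(x, 2r) contains 3 grid points
print(empirical_doubling_ratio(grid, [0.6, 1.3, 2.7, 10.2, 40.1]))
```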

We call any absolutely continuous map γ: [a, b] → X a curve and use the abbreviation $\gamma_s = \gamma(s)$. The length of the curve γ is defined as

$$\ell(\gamma) = \sup\Big\{\sum_{i=1}^{N} d(\gamma_{t_{i-1}},\gamma_{t_i}) : a \le t_0 < t_1 < \dots < t_N \le b,\ N\in\mathbb{N}\Big\}.$$

We call the curve γ: [a, b] → X a geodesic if $\ell(\gamma) = d(\gamma_a,\gamma_b)$. The metric space X itself is called geodesic if any two points x, y ∈ X can be connected with a geodesic, i.e. there exists a geodesic γ: [a, b] → X with $\gamma_a = x$ and $\gamma_b = y$. Sometimes, when there is no danger of confusion, we also call the image of a geodesic a geodesic. The speed of a curve γ is given by

$$|\dot\gamma|(t) = \lim_{s\to t}\frac{d(\gamma_s,\gamma_t)}{|s-t|}$$

whenever the limit exists. It is not hard to prove, see for instance Theorem 1.1.2 in [3], that it indeed exists at $\mathcal{L}^1$-almost every point t ∈ [a, b], where $\mathcal{L}^1$ is the Lebesgue measure on ℝ, and that $\ell(\gamma) = \int_a^b |\dot\gamma|(t)\,dt$.
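The definition of ℓ(γ) as a supremum over partitions suggests a direct approximation: by the triangle inequality, inscribed polygonal sums can only increase when the partition is refined, and they converge to the length. A minimal sketch, assuming the hypothetical example of the upper unit semicircle in ℝ² (whose length is π) with the Euclidean distance:

```python
import math


def inscribed_length(curve, a, b, n):
    """Sum of d(gamma_{t_{i-1}}, gamma_{t_i}) over the uniform partition of [a, b] into n pieces."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    pts = [curve(t) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def semicircle(t):
    return (math.cos(t), math.sin(t))


# Doubling n refines the partition, so the sums form a nondecreasing sequence.
lengths = [inscribed_length(semicircle, 0.0, math.pi, 2 ** k) for k in range(11)]
print(lengths[0], lengths[-1], math.pi)
```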

We denote by Geo(X) the set of all constant speed geodesics in X which are parametrized by [0, 1], namely $d(\gamma_s,\gamma_t) = |t-s|\,d(\gamma_0,\gamma_1)$ for all s, t ∈ [0, 1]. By a reparameterization argument, constant speed geodesics connecting any two given points exist in any geodesic space. We equip the space Geo(X) with the distance

$$d^*(\gamma,\tilde\gamma) = \max_{t\in[0,1]} d(\gamma_t,\tilde\gamma_t)$$

and note that $(\mathrm{Geo}(X), d^*)$ is also complete and separable since the underlying metric space is. We will also use the convenient notation of the evaluation map $e_t: \mathrm{Geo}(X) \to X$, defined as $e_t(\gamma) = \gamma_t$ for all t ∈ [0, 1].

With the basic notation related to geodesics now fixed, we are ready to introduce the two definitions of non-branching which play a crucial role in our results.

Definition 2.1. We call a geodesic metric space (X, d) non-branching if for any two constant speed geodesics γ, γ': [0, 1] → X with $\gamma_0 = \gamma'_0$ and $\gamma_s = \gamma'_s$ for some s ∈ (0, 1) we have $\gamma_t = \gamma'_t$ for all t ∈ [0, 1].

We would like to use non-branching at the level of the tangent spaces. However, two distinct geodesics of a metric space can collapse into a single geodesic of the tangent space in the blow-up. To control such collapsing we will assume a stronger version of non-branching.

Definition 2.2. We call a geodesic metric space (X, d) strongly non-branching if for any two constant speed geodesics γ, γ': [0, 1] → X with $\gamma_0 = \gamma'_0$, $\gamma_1 \ne \gamma'_1$ and $d(\gamma_0,\gamma_1) > 0$, we have

$$\liminf_{t\downarrow 0}\frac{d(\gamma_t,\gamma'_t)}{d(\gamma_0,\gamma_t)} > 0. \qquad (2.1)$$



In our main theorem, Theorem 4.3, we assume that the space is strongly non-branching and that at almost every point we have some non-branching tangent space. Before defining what we mean by tangent space we recall the definitions of the Hausdorff and Gromov-Hausdorff distances. The Hausdorff distance between closed sets A, B ⊂ X is defined as

$$d_H(A,B) = \max\Big\{\sup_{a\in A}\inf_{b\in B} d(a,b),\ \sup_{b\in B}\inf_{a\in A} d(a,b)\Big\}.$$
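For finite sets the suprema and infima in $d_H$ become maxima and minima, which gives a direct implementation. A minimal sketch (the helper names are illustrative), using points of ℝ with d(a, b) = |a − b|:

```python
def hausdorff_distance(A, B, d):
    """d_H(A, B) for finite nonempty sets A, B in a metric space with distance d."""
    def one_sided(P, Q):
        # largest distance from a point of P to the set Q
        return max(min(d(p, q) for q in Q) for p in P)
    return max(one_sided(A, B), one_sided(B, A))


def line_distance(a, b):
    return abs(a - b)


print(hausdorff_distance([0, 1, 2], [0, 1, 5], line_distance))  # 3: the point 5 is far from A
```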

Using the Hausdorff distance, the Gromov-Hausdorff distance between two metric spaces $(X, d_X)$ and $(Y, d_Y)$ is then defined as

$$d_{GH}(X,Y) = \inf d_H(f(X), g(Y)),$$

where the infimum is taken over all metric spaces $(Z, d_Z)$ and isometric embeddings f: X → Z, g: Y → Z. Finally, a sequence $(X_n, d_n, x_n)_{n=1}^\infty$ of metric spaces $(X_n, d_n)$ and points $x_n \in X_n$ is said to converge to (X, d, x) in the pointed Gromov-Hausdorff sense if

$$\lim_{n\to\infty} d_{GH}\big(B_{X_n}(x_n,r), B_X(x,r)\big) = 0 \qquad \forall\, r > 0,$$

where by B we denote the closed ball. Given a metric space (X, d) and a scaling factor r > 0, we define a rescaled metric $d_r$ on X by setting

$$d_r(x,y) = \frac{1}{r}\,d(x,y)$$

for all x, y ∈ X.

Definition 2.3. Let (X, d) be a metric space. We call a metric space (Y, ρ) tangent to (X, d) at x ∈ X if there exist a sequence $r_n \downarrow 0$ and y ∈ Y so that

$$(X, d_{r_n}, x) \to (Y, \rho, y)$$

in the pointed Gromov-Hausdorff convergence.

Notice that our definition of a tangent space is weaker than Gromov's original notion. He required the tangent space to be the full limit of the spaces $(X, d_r, x)$ as r → 0, whereas in our definition we only require convergence along a subsequence. As a consequence, with our definition the space (X, d) can in principle have a huge collection of different tangent spaces at a single point.
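As an elementary example of Definition 2.3, the unit circle S¹ ⊂ ℝ², with the restriction of the Euclidean distance, has the real line as a tangent at every point. The sketch below (helper names hypothetical) measures, on a finite grid of parameters a, b ∈ [−1, 1], how far the rescaled distance $d_r$ between the points at angles ra and rb is from |a − b|. This sampled distortion of the natural correspondence only illustrates the convergence; it is not a computation of $d_{GH}$.

```python
import math


def rescaled_chord(a, b, r):
    """d_r between the points at angles r*a and r*b on the unit circle,
    where d is the Euclidean (chordal) distance of R^2 restricted to the circle."""
    pa = (math.cos(r * a), math.sin(r * a))
    pb = (math.cos(r * b), math.sin(r * b))
    return math.dist(pa, pb) / r


def sample_distortion(r, samples=201):
    """max |d_r(p(ra), p(rb)) - |a - b|| over a grid of parameters a, b in [-1, 1]."""
    params = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(rescaled_chord(a, b, r) - abs(a - b)) for a in params for b in params)


for r in (0.5, 0.1, 0.001):
    print(r, sample_distortion(r))  # shrinks like r^2 / 3 as r -> 0
```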

We will use the following well-known result (see for instance [?, Proposition 2.7]) which allows us to move from Gromov-Hausdorff convergence to Hausdorff convergence.

Theorem 2.4. If $(X_n, d_n) \to (X, d)$ in the Gromov-Hausdorff convergence, then there exist a space $(Z, d_Z)$ and isometric embeddings

$$i_n: (X_n,d_n)\to(Z,d_Z), \qquad i: (X,d)\to(Z,d_Z)$$

so that $i_n(X_n) \to i(X)$ in the Hausdorff convergence.

In addition, if the spaces $(X_n, d_n)$ are equi-compact, then $(Z, d_Z)$ can be taken to be a compact metric space.
3. Gradients along geodesics

Let us recall some basic definitions in measure theory. The collection of universally measurable sets of the space X, denoted by $\mathscr{B}^*(X)$, is the σ-algebra of the sets which are μ-measurable for every finite nonnegative Borel measure μ on (X, d). The collection of all Borel sets of (X, d) will be denoted by $\mathscr{B}(X)$.

Now we turn to our next set of definitions that concern metric differentials. 

Definition 3.1. Given a function f: X → ℝ we define the lower ascending slope along geodesics of f at x ∈ X as

$$|\nabla^+_g f|(x) = \sup_{\gamma}\liminf_{s\downarrow 0}\frac{[f(\gamma_s)-f(x)]^+}{d(\gamma_s,x)},$$

where $[\,\cdot\,]^+$ denotes the positive part and the supremum is taken over all nonconstant geodesics γ ∈ Geo(X) that start from the point x, i.e. with $\gamma_0 = x$.

Here ascending refers to the fact that we are taking the positive part of the 
difference quotient and lower refers to the fact that we are taking the liminf, 
rather than the limsup. 
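In ℝⁿ with the Euclidean distance the constant speed geodesics starting at x are the segments s ↦ x + sv, so $|\nabla^+_g f|(x)$ is the largest positive one-sided directional derivative over unit vectors v, and it equals |∇f(x)| when f is differentiable at x. The crude numerical sketch below (the helper is hypothetical; it samples finitely many directions and a single small s instead of taking the liminf) illustrates the role of the positive part.

```python
import math


def slope_along_segments(f, x, directions=3600, s=1e-6):
    """Crude approximation of the lower ascending slope along geodesics of f: R^2 -> R at x.

    The geodesics from x are the segments s -> x + s*v; the difference quotient is
    scale invariant, so only unit vectors v are used, and the liminf as s -> 0 is
    replaced by the value at one small s.
    """
    best = 0.0
    for k in range(directions):
        theta = 2 * math.pi * k / directions
        y = (x[0] + s * math.cos(theta), x[1] + s * math.sin(theta))
        gain = max(f(y) - f(x), 0.0)  # positive part [f(y) - f(x)]^+
        best = max(best, gain / math.dist(x, y))
    return best


print(slope_along_segments(lambda p: p[0] + 2 * p[1], (0.3, -0.7)))  # close to sqrt(5)
print(slope_along_segments(lambda p: math.hypot(*p), (0.0, 0.0)))     # 1: |x| grows in every direction
print(slope_along_segments(lambda p: -math.hypot(*p), (0.0, 0.0)))    # 0: -|x| only decreases
```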

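Proposition 3.2. Suppose that f: X → ℝ is continuous. Then $|\nabla^+_g f|$ is universally measurable.

Proof. Let T ∈ ℝ and consider the superlevel set $\{|\nabla^+_g f| > T\} \subset X$. For the universal measurability it is sufficient to show that this set is Suslin. Because X is complete and separable, so are Geo(X) and X × Geo(X). Therefore $\{|\nabla^+_g f| > T\}$, being the projection to the space X of the set

$$\Big\{(x,\gamma) : \gamma_0 = x,\ \liminf_{s\downarrow 0}\frac{[f(\gamma_s)-f(x)]^+}{d(\gamma_s,x)} > T\Big\} \subset X\times\mathrm{Geo}(X),$$

is indeed Suslin, since the projected set is Borel: by the continuity of $s \mapsto [f(\gamma_s)-f(x)]^+/d(\gamma_s,x)$ on (0, 1] for nonconstant γ, it can be written as

$$\bigcup_{\tau\in\mathbb{Q}\cap(T,\infty)}\ \bigcup_{t\in\mathbb{Q}\cap(0,1)}\ \bigcap_{s\in\mathbb{Q}\cap(0,t)}\Big\{(x,\gamma) : \gamma_0 = x,\ d(\gamma_0,\gamma_1) > 0,\ [f(\gamma_s)-f(x)]^+ \ge \tau\, d(\gamma_s,x)\Big\},$$

a countable union of countable intersections of Borel sets. □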

Definition 3.3. A function g: X → [0, ∞] is an upper gradient along geodesics of a function f: X → ℝ if for any γ ∈ Geo(X) we have

$$|f(\gamma_0)-f(\gamma_1)| \le \int_\gamma g, \qquad (3.1)$$

where the integral along γ is understood as

$$\int_\gamma g = \ell(\gamma)\int_0^1 g(\gamma_s)\,ds.$$
The almost everywhere differentiability of Lipschitz functions on the real line implies that lower ascending slopes along geodesics are upper gradients along geodesics for Lipschitz functions. We include the easy proof of this fact here for the convenience of the reader. Recall that a function f: X → ℝ is called Lipschitz if there exists a constant 0 ≤ L < ∞ so that for any two points x, y ∈ X we have

$$|f(x)-f(y)| \le L\,d(x,y).$$

Proposition 3.4. Let f: X → ℝ be Lipschitz. Then the lower ascending slope along geodesics is an upper gradient along geodesics.

Proof. By Proposition 3.2 the function $|\nabla^+_g f|$ is universally measurable, and it is easily seen that this implies the $\mathcal{L}^1$-measurability of $|\nabla^+_g f|\circ\gamma$ (just consider the push forward under γ of $\mathcal{L}^1$), see also [4, Lemma 2.4]. Take γ ∈ Geo(X). The function f∘γ: [0, 1] → ℝ is Lipschitz and therefore differentiable $\mathcal{L}^1$-almost everywhere. In particular, $|(f\circ\gamma)'(t)| \le \ell(\gamma)\,|\nabla^+_g f|(\gamma_t)$ holds and both sides of the inequality are well defined at $\mathcal{L}^1$-almost every point t ∈ [0, 1] (at a point of differentiability, test the definition of $|\nabla^+_g f|(\gamma_t)$ with the restriction of γ to [t, 1] if $(f\circ\gamma)'(t) \ge 0$, and with the reversed restriction of γ to [0, t] otherwise). Thus

$$|f(\gamma_0)-f(\gamma_1)| = \Big|\int_0^1 (f\circ\gamma)'(s)\,ds\Big| \le \int_0^1 |(f\circ\gamma)'(s)|\,ds \le \ell(\gamma)\int_0^1 |\nabla^+_g f|(\gamma_s)\,ds.$$

□



It is interesting to compare the ascending slope and the upper gradient defined along geodesics to the more commonly used versions. First of all, it is immediate that we always have

$$|\nabla^+_g f|(x) \le |\nabla^+_c f|(x) \le |\nabla^+ f|(x), \qquad (3.2)$$

where the usual ascending slope $|\nabla^+ f|(x)$ of f at a point x is defined as

$$|\nabla^+ f|(x) = \limsup_{y\to x}\frac{[f(y)-f(x)]^+}{d(y,x)}$$

and the lower ascending slope along curves as

$$|\nabla^+_c f|(x) = \sup_{\gamma}\liminf_{s\downarrow 0}\frac{[f(\gamma_s)-f(x)]^+}{d(\gamma_s,x)},$$

with the supremum taken over all curves γ starting from x (recall that by convention all curves we consider are absolutely continuous). Moreover, the inequalities in (3.2) can be strict. Notice also that the choice of the lower concept (i.e. with the liminf) is justified by the fact that the upper concept is easily seen to coincide with $|\nabla^+ f|$.

Recall that g: X → [0, ∞] is an upper gradient of a function f: X → ℝ in the usual sense if inequality (3.1) holds along all curves γ: [0, 1] → X, where this time $\int_\gamma g$ is understood as $\int_0^1 g(\gamma_s)\,|\dot\gamma|(s)\,ds$. It is not difficult to show, following the same proof given in Proposition 3.2, that ascending slopes along curves are universally measurable. Moreover, as in Proposition 3.4, one can prove that ascending slopes along curves are upper gradients for Lipschitz functions.

Identifying geodesics with elements of the tangent space, as in the theory of Alexandrov spaces, one can think of ascending slopes along geodesics as one-sided directional derivatives. Hence, it is natural to ask whether ascending slopes along geodesics are also upper gradients (in the usual sense) for Lipschitz functions. This is not true in general, as we will see in the next example.

Example 3.5. There exist a separable complete geodesic metric space (X, d) and a Lipschitz function f: X → ℝ so that $|\nabla^+_g f|$ is not an upper gradient of f.

Let us first construct the metric space (X, d). We start the construction by taking a unit line-segment, which we simply denote by [0, 1]. Next, for all n ∈ ℕ and 0 ≤ k < 2^n we connect the points $k2^{-n}$ and $(k+1)2^{-n}$ in [0, 1] with an arc $A_{n,k}$ of length $(2-2^{-n})2^{-n}$. In Figure 1 the arcs $A_{n,k}$ are drawn as half-circles. The space X is then obtained by attaching to the initial line-segment [0, 1] the arcs $A_{n,k}$, which are pairwise disjoint except at their end-points.

We define the distance d between two points x, y ∈ X as

$$d(x,y) = \inf\sum_i \ell(E_i),$$

where the infimum is taken over all collections of sets $E_i$ that connect the points x and y, the $E_i$ are subsets of the arcs, and $\ell(E_i)$ is the length of the piece $E_i$ determined by the length of the arc. This way, on each arc $A_{n,k}$ the distance is given by the natural distance determined by the length of the arc. See the left part of Figure 1 for an illustration of the space.

Let us check that (X, d) is geodesic. Let x, y ∈ X be two distinct points. If x and y lie on the same arc, then the segment of the arc joining the points is our geodesic. We may then assume that the points are not on the same arc. We may also assume that x, y ∈ [0, 1]. If this is not the case, for example x ∉ [0, 1], we simply notice that any curve connecting the points x and y must go via one of the end-points x' and x'' of the arc in which x lies, and that

$$d(x,y) = \min\{d(x,x') + d(x',y),\ d(x,x'') + d(x'',y)\}.$$

We can now find the geodesic between the points x and y with the following procedure. Let $(\gamma^i)_{i=0}^\infty$ be a sequence of curves joining x to y so that $\lim_{i\to\infty}\ell(\gamma^i) = d(x,y)$. Because the lengths of the arcs are chosen so that the shortest curve between the points $k2^{-n}$ and $(k+1)2^{-n}$ is the arc $A_{n,k}$, there exists $i_0 \in \mathbb{N}$ so that each $\gamma^i$, $i \ge i_0$, contains some $A_{n,k}$ with

$$n \le -\frac{\log|x-y|}{\log 2} + 1 \qquad (3.3)$$
and with some k. So, taking a subsequence of $(\gamma^i)$ we may assume that all the curves contain the same arc $A_{n,k}$. Continuing inductively in the same way with the end-points of this arc and the points x and y, and finally using a diagonal argument, we obtain the geodesic.

Figure 1. On the left is an illustration of the space (X, d). The lengths of the curves are chosen so that the geodesics between two points on the interval prefer going along the longest curves. On the right is the graph of the Lipschitz function f drawn along a part of the interval and along a couple of the construction curves joining the points of the interval.

We define the Lipschitz function f: X → ℝ first on [0, 1] by setting $f|_{[0,1]}(x) = x$. This fixes the function on the end-points of all the arcs. We continue it inside the arcs by defining, for all n ∈ ℕ and 0 ≤ k < 2^n,

$$f(\gamma_t) = \begin{cases} k2^{-n} - 2^{-n+1}t, & \text{for } t\in[0,1/2],\\ (k-3)2^{-n} + 2^{-n+2}t, & \text{for } t\in[1/2,1],\end{cases}$$

where γ: [0, 1] → X is the constant speed geodesic joining $k2^{-n}$ to $(k+1)2^{-n}$ in $A_{n,k}$. See the right part of Figure 1 for the graph of the function along a couple of the arcs and the line [0, 1].
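The construction can be checked with exact rational arithmetic. The sketch below (helper names hypothetical) encodes the arc lengths and the formula for f along $A_{n,k}$, and verifies for small n that f agrees with $f|_{[0,1]}(x) = x$ at both end-points, that the middle point z of $A_{n,k}$ has $f(z) = (k-1)2^{-n}$, that passing through two arcs of level n + 1 is longer than using $A_{n,k}$ itself, and that f is 4-Lipschitz with respect to arc length along each arc.

```python
from fractions import Fraction


def arc_length(n):
    """Length (2 - 2^-n) 2^-n of the arcs A_{n,k}."""
    h = Fraction(1, 2 ** n)
    return (2 - h) * h


def f_on_arc(n, k, t):
    """f(gamma_t), gamma the constant speed geodesic from k 2^-n to (k+1) 2^-n inside A_{n,k}."""
    h = Fraction(1, 2 ** n)
    t = Fraction(t)
    if t <= Fraction(1, 2):
        return k * h - 2 * h * t
    return (k - 3) * h + 4 * h * t


for n in range(6):
    h = Fraction(1, 2 ** n)
    # going through two arcs of level n + 1 is longer than using A_{n,k}
    assert 2 * arc_length(n + 1) > arc_length(n)
    # steepest part of f along the arc: change 4h per unit of t, arc-length speed arc_length(n)
    assert 4 * h / arc_length(n) <= 4
    for k in range(2 ** n):
        assert f_on_arc(n, k, 0) == k * h                     # left end-point
        assert f_on_arc(n, k, 1) == (k + 1) * h               # right end-point
        assert f_on_arc(n, k, Fraction(1, 2)) == (k - 1) * h  # middle point z
print("all checks passed")
```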

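Let us now show that $|\nabla^+_g f|(x) = 0$ for all x ∈ [0, 1]. To see this take a geodesic γ starting from a point x ∈ [0, 1]. If γ near the point x consists only of one piece of an arc, the equality

$$\liminf_{s\downarrow 0}\frac{[f(\gamma_s)-f(x)]^+}{d(\gamma_s,x)} = 0$$

is immediate, since f decreases when one enters an arc from either of its end-points. Suppose then that for every ε > 0 there exists a point y ∈ [0, 1] which is in the considered geodesic γ and 0 < d(x, y) < ε. As we have noted before, the part of γ that connects x to y must contain an arc $A_{n,k}$ with n bounded as in (3.3) and with some k. Moreover, we can take such an $A_{n,k}$ that $x > (k-1)2^{-n}$. Let z be the middle point of $A_{n,k}$. Now $f(z) = (k-1)2^{-n} < f(x)$, and since $A_{n,k}$ lies in the part of γ between x and y, we have d(x, z) < ε. Hence there are points of γ arbitrarily close to x where f is smaller than f(x), and so indeed

$$|\nabla^+_g f|(x) = 0.$$

On the other hand, the constant speed curve γ: [0, 1] → [0, 1] traversing the segment from 0 to 1 has length 2 and is therefore an admissible test curve for the upper gradient property. Since $|f(\gamma_1)-f(\gamma_0)| = 1$ while $|\nabla^+_g f|$ vanishes on [0, 1], inequality (3.1) fails along γ, so $|\nabla^+_g f|$ is not an upper gradient of f.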
It is easy to see that the space X in our previous example is not doubling and that it is extremely branching. In light of the example, one could still hope that many natural conjectures are true.

Question 3.6. Which assumptions are needed on the metric space (X, d) and on the measure m to ensure, for any Lipschitz function f: X → ℝ, that

(i) the equality $|\nabla^+_g f|(x) = |\nabla^+ f|(x)$ holds at m-almost every point x ∈ X?

(ii) the function $|\nabla^+_g f|$ is an upper gradient of f?

Notice that if m is a doubling measure on (X, d), then $|\nabla^+ f| = |\nabla f|$ m-a.e. in X (see [4, Proposition 2.5] for the simple proof of this fact) for any Lipschitz function f.

In addition, if (X, d, m) supports a (1, 1)-Poincaré inequality, then we can apply Cheeger's theory to obtain $|\nabla f| \le g$ m-a.e. in X for any (weak) upper gradient g of f. Choosing $g = |\nabla^+_c f|$ yields

$$|\nabla^+_c f| = |\nabla^+ f| = |\nabla f| \qquad m\text{-a.e. in } X.$$

For the same reason, under the same doubling and Poincaré assumptions, a positive answer to question (ii) implies a positive answer to question (i): indeed, choosing $g = |\nabla^+_g f|$ one obtains

$$|\nabla^+_g f| = |\nabla^+_c f| = |\nabla^+ f| = |\nabla f| \qquad m\text{-a.e. in } X.$$

4. Mass transportation in metric spaces 

Before stating and proving our main results, we briefly discuss in the next subsection the basic properties of transport plans and maps in the general metric space setting. A comprehensive treatment of the theory can be found for example in [3] and [17].

4.1. Basic properties of transport plans. Let $\mathscr{P}(X)$ denote the set of all Borel probability measures on X. The Wasserstein distance between two measures μ, ν ∈ $\mathscr{P}(X)$ is defined as

$$W_2(\mu,\nu) = \Big(\inf\int_{X\times X} d^2(x,y)\,d\gamma(x,y)\Big)^{1/2}, \qquad (4.1)$$

where the infimum is taken over all transport plans γ between μ and ν, i.e. measures $\gamma \in \mathscr{P}(X\times X)$ for which $p^1_\#\gamma = \mu$ and $p^2_\#\gamma = \nu$. Here the mappings $p^1$ and $p^2$ denote the projections to the first and second coordinate respectively. The notation $f_\#\mu$ for a measure μ ∈ $\mathscr{P}(X)$ and a μ-measurable mapping f: X → Y means the push-forward measure defined as $f_\#\mu(A) = \mu(f^{-1}(A))$ for all $A \in \mathscr{B}(Y)$. Notice that in general $W_2(\mu,\nu)$ might be infinite. We call a transport plan $\gamma_0$ between two measures μ, ν ∈ $\mathscr{P}(X)$, for which $W_2(\mu,\nu) < \infty$, optimal if the infimum in (4.1) is attained at $\gamma = \gamma_0$.
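For finitely supported measures the push-forward, the marginal conditions and the cost in (4.1) are finite sums. A minimal sketch, with measures stored as dictionaries {point: mass} with exact rational masses (the helper names are hypothetical):

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(measure, f):
    """f_# mu for a finitely supported measure given as {point: mass}."""
    image = defaultdict(Fraction)
    for point, mass in measure.items():
        image[f(point)] += mass  # the mass of f^{-1}({y}) is collected at y
    return dict(image)


def marginals(plan):
    """(p^1_# gamma, p^2_# gamma) for a plan given as {(x, y): mass}."""
    return pushforward(plan, lambda xy: xy[0]), pushforward(plan, lambda xy: xy[1])


def plan_cost(plan, d):
    """Integral of d^2(x, y) with respect to the plan."""
    return sum(mass * d(x, y) ** 2 for (x, y), mass in plan.items())


half = Fraction(1, 2)
plan = {(0, -1): half, (0, 1): half}  # splits the Dirac mass at 0
print(marginals(plan), plan_cost(plan, lambda a, b: abs(a - b)))
```

Since $\delta_0\otimes\nu$ is the only plan whose first marginal is $\delta_0$, this plan is optimal and $W_2(\delta_0, (\delta_{-1}+\delta_1)/2) = 1$.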
Since we are dealing with geodesic spaces, we can equivalently consider geodesic transport plans. We define the set of geodesic plans between μ and ν as the set of all π ∈ $\mathscr{P}(\mathrm{Geo}(X))$ for which $(e_0)_\#\pi = \mu$, $(e_1)_\#\pi = \nu$. We say that a geodesic plan is optimal, and write π ∈ GeoOpt(μ, ν), if

$$\int_{\mathrm{Geo}(X)} d^2(\gamma_0,\gamma_1)\,d\pi(\gamma) = W_2^2(\mu,\nu) < \infty.$$

Given an optimal geodesic plan π ∈ GeoOpt(μ, ν), it is clear that $(e_0,e_1)_\#\pi$ is an optimal plan. Conversely, making a measurable selection of constant speed geodesics $\gamma^{x,y}$ from x to y and considering the law of $(x,y) \mapsto \gamma^{x,y}$ under γ, any optimal plan can be "lifted" to an optimal geodesic plan with the same cost.
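In ℝ the lifting of an optimal plan to a geodesic plan is explicit: each pair (x, y) charged by the plan is joined by the segment t ↦ (1 − t)x + ty. The sketch below assumes uniform measures on n points of the line, for which the sorted matching is optimal (a classical one-dimensional fact), and checks numerically that t ↦ $(e_t)_\#\pi$ is a constant speed geodesic in the Wasserstein space, i.e. $W_2((e_s)_\#\pi, (e_t)_\#\pi) = |t-s|\,W_2(\mu,\nu)$. The helper names are hypothetical.

```python
import math


def w2_uniform_1d(xs, ys):
    """W_2 between uniform measures on n points of the line (the sorted matching is optimal)."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def interpolate(xs, ys, t):
    """Atoms of (e_t)_# pi, where pi moves the i-th smallest x along the segment to the i-th smallest y."""
    return [(1 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


xs, ys = [0.0, 1.0, 3.0], [2.0, 0.0, 5.0]
W = w2_uniform_1d(xs, ys)
for t in (0.25, 0.5, 0.75):
    # along the geodesic plan: W_2(mu_0, mu_t) = t * W_2(mu_0, mu_1)
    print(t, w2_uniform_1d(xs, interpolate(xs, ys, t)), t * W)
```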

The Kantorovich formulation of the transportation problem also has a very useful dual formulation: the minimum in (1.2) is equal to

$$2\sup\Big\{\int_X \varphi(x)\,d\mu(x) + \int_X \psi(y)\,d\nu(y)\Big\},$$

where the supremum is taken among all pairs (φ, ψ) ∈ $C^0(X)\times C^0(X)$ satisfying

$$\varphi(x) + \psi(y) \le \tfrac12 d^2(x,y) \qquad \text{for all } x, y \in X.$$

We define the c-transform of a function φ: X → ℝ ∪ {−∞} as

$$\varphi^c(y) = \inf_{x\in X}\Big\{\frac{d^2(x,y)}{2} - \varphi(x)\Big\}.$$

A function φ is called c-concave if $\varphi = \psi^c$ for some function ψ. This terminology (c-transform, c-concavity) refers to a general cost function c. Here and in the sequel the cost function c is given by the halved square of the distance, $c(x,y) = d^2(x,y)/2$.
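On a finite set the infimum in the c-transform is a minimum, and the basic identities can be checked exactly. A minimal sketch on hypothetical data (five integer points of ℝ, rational arithmetic, $c(x,y) = d^2(x,y)/2$): it checks $\varphi^{cc} \ge \varphi$ and $\varphi^{ccc} = \varphi^c$, the latter showing that every c-transform $\psi = \varphi^c$ is c-concave with $\psi = (\psi^c)^c$.

```python
from fractions import Fraction


def c_transform(phi, points):
    """phi^c(y) = min over x of d(x, y)^2 / 2 - phi(x), on a finite set of integers."""
    return {y: min(Fraction((x - y) ** 2, 2) - phi[x] for x in points) for y in points}


points = [0, 1, 2, 3, 4]
phi = {0: 0, 1: 3, 2: -1, 3: 2, 4: 5}  # an arbitrary (hypothetical) function on the points
psi = c_transform(phi, points)          # psi = phi^c
phi_cc = c_transform(psi, points)       # phi^{cc}

print(all(phi_cc[x] >= phi[x] for x in points))  # phi^{cc} >= phi pointwise
print(c_transform(phi_cc, points) == psi)        # phi^{ccc} = phi^c
```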

Definition 4.1. Given an optimal geodesic plan π ∈ GeoOpt(μ, ν), we call a Borel function φ: X → ℝ ∪ {−∞} a Kantorovich potential (relative to the optimal geodesic plan π) if it is c-concave and

$$\varphi(\gamma_0) + \varphi^c(\gamma_1) = \frac{d^2(\gamma_0,\gamma_1)}{2} \qquad \text{for } \pi\text{-a.e. } \gamma \in \mathrm{Geo}(X).$$

Notice that because the Kantorovich potential φ is c-concave we have $\varphi = (\varphi^c)^c$, and that we make no integrability assumption on φ and $\varphi^c$.
A set Γ ⊂ X × X is called cyclically monotone if

$$\sum_{i=1}^n d^2(x_i,y_i) \le \sum_{i=1}^n d^2(x_i,y_{\sigma(i)})$$

for any $(x_1,y_1), \dots, (x_n,y_n) \in \Gamma$ and permutation σ of {1, ..., n}.
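For a finite set Γ the definition can be tested by brute force: a permutation of a sub-family extends to a permutation of all pairs by fixing the other indices, which adds the same terms to both sides, so it is enough to test all permutations of the whole family. A minimal sketch (the helper is hypothetical, exponential in the number of pairs, and uses integer points of ℝ so that all comparisons are exact):

```python
from itertools import permutations


def is_cyclically_monotone(pairs, d):
    """Check sum d^2(x_i, y_i) <= sum d^2(x_i, y_sigma(i)) for all permutations sigma."""
    base = sum(d(x, y) ** 2 for x, y in pairs)
    for sigma in permutations(range(len(pairs))):
        permuted = sum(d(pairs[i][0], pairs[sigma[i]][1]) ** 2 for i in range(len(pairs)))
        if permuted < base:
            return False
    return True


def gap(a, b):
    return abs(a - b)


print(is_cyclically_monotone([(0, 0), (1, 2), (3, 5)], gap))  # True: monotone matching on the line
print(is_cyclically_monotone([(0, 5), (3, 0)], gap))          # False: swapping the targets is cheaper
```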

Suppose that $W_2(\mu,\nu) < \infty$ and π ∈ GeoOpt(μ, ν). Then $(e_0,e_1)_\#\pi$ is supported on a cyclically monotone set and a Kantorovich potential relative to π exists.
4.2. Brenier theorem in metric spaces. As a starting point we prove the following result, which originates from [4, Theorem 10.3]. The difference compared to the original result is in the definition of the ascending slope, here replaced by the lower ascending slope along geodesics. We also repeat the proof in our situation for the convenience of the reader.

Proposition 4.2. Suppose that m is a finite measure on a bounded space (X, d) and μ = ρm ∈ $\mathscr{P}(X)$, ν ∈ $\mathscr{P}(X)$. Take π ∈ GeoOpt(μ, ν) and let φ: X → ℝ ∪ {−∞} be a Kantorovich potential relative to π. If there exists $\bar s \in (0,1)$ satisfying $(e_s)_\#\pi = \rho_s m$ for all $s \in (0,\bar s)$ and

$$\limsup_{s\downarrow 0}\int_X \rho_s\log\rho_s\,dm < \infty, \qquad (4.2)$$

then

$$|\nabla^+_g\varphi|(\gamma_0) = d(\gamma_0,\gamma_1) \qquad \text{for } \pi\text{-a.e. } \gamma\in\mathrm{Geo}(X).$$
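Proof. By the definition of the Kantorovich potential we have

$$\varphi(\gamma_0) = \frac{d^2(\gamma_0,\gamma_1)}{2} - \varphi^c(\gamma_1) \qquad (4.3)$$

for π-a.e. γ ∈ Geo(X). On the other hand, for any z ∈ X we have

$$\varphi(z) \le \frac{d^2(z,\gamma_1)}{2} - \varphi^c(\gamma_1). \qquad (4.4)$$

Thus, combining these two, we get that for π-a.e. γ it holds

$$|\nabla^+_g\varphi|(\gamma_0) \le \limsup_{z\to\gamma_0}\frac{[\varphi(z)-\varphi(\gamma_0)]^+}{d(z,\gamma_0)} \le \limsup_{z\to\gamma_0}\frac{[d^2(z,\gamma_1)-d^2(\gamma_0,\gamma_1)]^+}{2\,d(z,\gamma_0)} \le \limsup_{z\to\gamma_0}\frac{d^2(z,\gamma_0)+2d(\gamma_0,\gamma_1)\,d(z,\gamma_0)}{2\,d(z,\gamma_0)} = d(\gamma_0,\gamma_1).$$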
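Let us now prove the converse inequality in an integral form. Taking $z = \gamma_t$ in (4.4) and combining it with (4.3) we obtain

$$\varphi(\gamma_0) - \varphi(\gamma_t) \ge \frac{d^2(\gamma_0,\gamma_1)}{2} - \frac{d^2(\gamma_t,\gamma_1)}{2} = \frac{2t-t^2}{2}\,d^2(\gamma_0,\gamma_1). \qquad (4.5)$$

Because X is bounded, φ is Lipschitz and so, by Proposition 3.4, the function $|\nabla^+_g\varphi|$ is an upper gradient of φ along geodesics. Applying (3.1) to the restriction of γ to [0, t] and then the Cauchy-Schwarz inequality, we have

$$(\varphi(\gamma_0)-\varphi(\gamma_t))^2 \le \Big(\int_0^t |\nabla^+_g\varphi|(\gamma_s)\,d(\gamma_0,\gamma_1)\,ds\Big)^2 \le t\,d^2(\gamma_0,\gamma_1)\int_0^t |\nabla^+_g\varphi|^2(\gamma_s)\,ds.$$

Dividing this by $d^2(\gamma_0,\gamma_t) = t^2 d^2(\gamma_0,\gamma_1)$, using (4.5) and integrating over Geo(X) yields, for $t < \bar s$,

$$\frac1t\int_0^t\int_X |\nabla^+_g\varphi|^2(x)\,\rho_s(x)\,dm(x)\,ds = \frac1t\int_0^t\int_{\mathrm{Geo}(X)} |\nabla^+_g\varphi|^2(\gamma_s)\,d\pi(\gamma)\,ds \ge \int_{\mathrm{Geo}(X)}\Big(\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{d(\gamma_0,\gamma_t)}\Big)^2 d\pi(\gamma) \ge \frac{(2-t)^2}{4}\int_{\mathrm{Geo}(X)} d^2(\gamma_0,\gamma_1)\,d\pi(\gamma).$$

From the assumption (4.2) we know that

$$\int_X g\,\rho_s\,dm \to \int_X g\,\rho\,dm \quad \text{as } s\downarrow 0, \text{ for all } g\in L^\infty(X,m).$$

Because m is finite and $|\nabla^+_g\varphi|$ is bounded, this holds also for $g = |\nabla^+_g\varphi|^2$. Therefore

$$\int_X |\nabla^+_g\varphi|^2\rho\,dm = \lim_{t\downarrow 0}\frac1t\int_0^t\int_X |\nabla^+_g\varphi|^2(x)\,\rho_s(x)\,dm(x)\,ds \ge \lim_{t\downarrow 0}\frac{(2-t)^2}{4}\int_{\mathrm{Geo}(X)} d^2(\gamma_0,\gamma_1)\,d\pi(\gamma) = \int_{\mathrm{Geo}(X)} d^2(\gamma_0,\gamma_1)\,d\pi(\gamma).$$

Since the left-hand side equals $\int_{\mathrm{Geo}(X)}|\nabla^+_g\varphi|^2(\gamma_0)\,d\pi(\gamma)$ and we have already shown that $|\nabla^+_g\varphi|(\gamma_0) \le d(\gamma_0,\gamma_1)$ for π-a.e. γ, equality holds π-a.e.

□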



With the help of Proposition 4.2 we are now able to prove a Brenier-type theorem in strongly non-branching metric spaces.

Theorem 4.3. Assume that (X, d) is a strongly non-branching geodesic metric space equipped with a doubling measure m, and that μ = ρm ∈ $\mathscr{P}(X)$, ν ∈ $\mathscr{P}(X)$ satisfy $W_2(\mu,\nu) < \infty$. Assume also that:

(a) for m-almost every point x ∈ X the space X has a non-branching tangent;

(b) there exists a transport plan π ∈ GeoOpt(μ, ν) such that for all s ∈ [0, 1) sufficiently small we have $(e_s)_\#\pi \ll m$ and the densities $\rho_s$ satisfy

$$\limsup_{s\downarrow 0}\int_X \rho_s\log\rho_s\,dm < \infty.$$

Then the optimal geodesic plan π is given by a mapping T: X → X, i.e. $\gamma_1 = T(\gamma_0)$ for π-a.e. γ ∈ Geo(X).

Proof. Suppose that there is no such T. If we then fix a point $x_0 \in X$ and consider the restricted and rescaled measures

$$\pi_r = \frac{1}{\pi(A(r))}\,\pi\llcorner A(r), \qquad \text{where } A(r) = \{\gamma\in\mathrm{Geo}(X) : \gamma_t \in B(x_0,r) \text{ for all } t\in[0,1]\},$$

we notice that for large enough r > 0 the assumptions of the theorem are satisfied and still there exists no such T. Therefore we may assume the space X to be bounded.
Let φ be a Kantorovich potential relative to π. Let x ∈ X be a point where the space X has a non-branching tangent. Suppose that there are two geodesics γ, γ̃ ∈ Geo(X) so that $\gamma_0 = \tilde\gamma_0 = x$, $\gamma_1 \ne \tilde\gamma_1$ and

$$|\nabla^+_g\varphi|(x) = d(x,\gamma_1) = d(x,\tilde\gamma_1). \qquad (4.6)$$

Let $(Y, d_Y)$ be a non-branching geodesic metric space tangent to X at x and $r_n \downarrow 0$ a sequence so that

$$(X, d_{r_n}, x) \to (Y, d_Y, 0)$$

in the pointed Gromov-Hausdorff convergence as n → ∞. Since we assumed (X, d, m) to be doubling, the spaces $(B_{(X,d_{r_n})}(x,1), d_{r_n})$ are easily seen to be equi-compact. Indeed, for any ε > 0 we can find a maximal disjoint family of balls of radius $r_n\varepsilon/2$ contained in $B(x, r_n)$, so that the family of balls with doubled radius covers $B(x, r_n)$, and use the doubling inequality

$$m(B_r(y)) \ge c\Big(\frac{r}{R}\Big)^{\alpha} m(B_R(x)) \qquad \text{whenever } B_r(y) \subset B_R(x)$$

(here c > 0 and α > 0 depend on the doubling constant only) with $r = r_n\varepsilon/2$ and $R = r_n$ to estimate the number of these balls with a constant depending only on c, α and ε. We can then apply Theorem 2.4 to obtain a compact space $(Z, d_Z)$ and isometric embeddings

$$i_n: (B_{(X,d_{r_n})}(x,1), d_{r_n}) \to (Z,d_Z), \qquad i: (B_{(Y,d_Y)}(0,1), d_Y) \to (Z,d_Z)$$

so that $i_n(B_{(X,d_{r_n})}(x,1)) \to i(B_{(Y,d_Y)}(0,1))$ in the Hausdorff convergence.

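From (4.6) we get for every n ∈ ℕ a constant speed geodesic $\gamma^n$ with $\gamma^n_0 = x$ and a radius $R_n > 0$ so that

$$\frac{\varphi(\gamma^n_s)-\varphi(x)}{d(\gamma^n_s,x)} \ge |\nabla^+_g\varphi|(x) - \frac1n$$

for every s ∈ (0, 1) for which $d(\gamma^n_s, x) < R_n$.

Now, possibly taking a subsequence of $(r_n)_{n=1}^\infty$ so that $r_n < R_n$, we get a sequence of points $(y_n)_{n=1}^\infty \subset X$ with $d(y_n, x) = r_n$ and

$$\frac{\varphi(y_n)-\varphi(x)}{d(y_n,x)} \ge |\nabla^+_g\varphi|(x) - \frac1n = d(x,\gamma_1) - \frac1n. \qquad (4.7)$$

Notice also that, by (4.4) and (4.3), for any y ∈ X we have

$$\varphi(y) \le \frac{d^2(y,\gamma_1)}{2} - \varphi^c(\gamma_1) = \frac{d^2(y,\gamma_1) - d^2(x,\gamma_1)}{2} + \varphi(x). \qquad (4.8)$$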
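Writing $z_n = \gamma_s$ for the s for which $d(\gamma_s, x) = r_n$, the triangle inequality and the geodesic property yield

$$d(y_n,\gamma_1) - d(y_n,z_n) \le d(z_n,\gamma_1) = d(x,\gamma_1) - d(x,z_n). \qquad (4.9)$$

Hence, using first (4.9), then (4.8) and eventually (4.7), we have

$$\begin{aligned}
0 &\le d(y_n,x) + d(x,z_n) - d(y_n,z_n) \le d(y_n,x) + d(x,\gamma_1) - d(y_n,\gamma_1)\\
&= d(y_n,x) + \frac{d^2(x,\gamma_1) - d^2(y_n,\gamma_1)}{d(x,\gamma_1) + d(y_n,\gamma_1)}\\
&\le d(y_n,x) - \frac{2(\varphi(y_n)-\varphi(x))}{d(x,\gamma_1) + d(y_n,\gamma_1)}\\
&\le d(y_n,x) - \frac{2\,d(y_n,x)\,\big(d(x,\gamma_1) - 1/n\big)}{d(x,\gamma_1) + d(y_n,\gamma_1)}\\
&= \Big(1 - \frac{2\big(d(x,\gamma_1) - 1/n\big)}{d(x,\gamma_1) + d(y_n,\gamma_1)}\Big)\,d(y_n,x) = o(r_n),
\end{aligned}$$

where the last equality holds because $d(y_n,x) = r_n$ and $d(y_n,\gamma_1) \to d(x,\gamma_1) > 0$.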

By taking a subsequence we find points y, z ∈ Y so that

$$i_n(y_n) \to i(y) \quad \text{and} \quad i_n(z_n) \to i(z).$$

In particular

$$d_Y(y,z) - d_Y(y,0) - d_Y(0,z) = \lim_{n\to\infty}\frac{d(y_n,z_n) - d(y_n,x) - d(x,z_n)}{r_n} = 0,$$

and so 0 lies on some constant speed geodesic η in Y joining y to z (obtained by the concatenation of the geodesics joining y to 0 and 0 to z).

With a similar argument we can show that 0 lies on some constant speed geodesic η̃ in Y joining y to z̃, where z̃ is obtained as the limit $i_n(\tilde z_n) \to i(\tilde z)$ of the points $\tilde z_n$ which are taken from the geodesic γ̃ so that $d(\tilde z_n, x) = r_n$. Note that we might have to go to yet another subsequence to achieve the convergence to z̃.

Because the space X is strongly non-branching, and since $d(x,\gamma_1) = d(x,\tilde\gamma_1)$ by (4.6), so that $z_n$ and $\tilde z_n$ correspond to the same parameter on γ and γ̃, we have

$$d_Y(z,\tilde z) \ge \liminf_{n\to\infty}\frac{d(z_n,\tilde z_n)}{r_n} > 0,$$

and so the geodesics η and η̃ (built with the same geodesic from y to 0) contradict the assumption that Y is non-branching.

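This means that our assumptions on the geodesics γ and γ̃ cannot be satisfied. Since, by Proposition 4.2, $|\nabla^+_g\varphi|(\gamma_0) = d(\gamma_0,\gamma_1)$ for π-a.e. γ, and since by (a) and μ ≪ m the space X has a non-branching tangent at $\gamma_0$ for π-a.e. γ, there exists a set A ⊂ Geo(X) so that π(Geo(X) \ A) = 0 and for every x ∈ X there is at most one γ ∈ A with $x = \gamma_0$. Using the set A we can define the transport map T as

$$T(x) = \begin{cases} \gamma_1, & \text{when } x = \gamma_0 \text{ for some } \gamma\in A,\\ x, & \text{otherwise.} \end{cases}$$

□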
Remark 4.4. One could prove Theorem 4.3 also under slightly different assumptions, namely by weakening the definition of strongly non-branching metric space by replacing the liminf in (2.1) with a limsup, and then assuming that at almost every point all the tangent spaces to X are non-branching. This modified theorem is achieved by letting the sequence of radii in the blow-up be dictated by the weakened form of the strong non-branching property, namely choosing $r_n$ in such a way that $d(x, z_n) = d(x, \tilde z_n) = r_n$ and $\lim_n d(z_n, \tilde z_n)/r_n > 0$.

Theorem 4.3 applies, for example, when the space $(X,d)$ is a finite dimensional Alexandrov space and $m$ is the corresponding volume measure on $X$. The estimate (4.2) on the relative entropy follows in this case from the result of Petrunin [12], which shows that in Alexandrov spaces the functional



is concave along Wasserstein geodesics. Notice that a different proof of the Brenier theorem in Alexandrov spaces was already given by Bertrand in [5].

The Brenier theorem has recently been established by Gigli [8] in non-branching spaces with Ricci curvature bounded from below. This generalizes the previous result by Bertrand and covers, for example, the case where the functional (4.10) is concave along geodesics in the Wasserstein space $(\mathscr{P}(X), W_2)$ of a non-branching space $(X,d)$. Whereas our proof of Theorem 4.3 is based on the behaviour of blow-ups and of the Kantorovich potential, the proof by Gigli relies on the concavity of the functional and does not use the Kantorovich potential at all. Notice that, because of the different techniques used in the proofs, our geometric assumptions on the metric space $X$ differ from those assumed by Gigli, and hence the two theorems cover different collections of metric spaces.

It is also important to notice that our Theorem 4.3 by no means covers all the cases where the Brenier theorem is known to hold. For example, the Brenier theorem holds in the Heisenberg group, but it is not difficult to see that the Heisenberg group is not strongly non-branching.

We end this paper with an improvement of [1, Theorem 10.4] in the case where the reference measure $m$ is doubling. In [1] it was shown that, without the assumption that $m$ is doubling, one has the conclusion
$$ \lim_{t\downarrow 0}\frac{\varphi(\gamma_0) - \varphi(\gamma_t)}{d(\gamma_0,\gamma_t)} = d(\gamma_0,\gamma_1) \quad \text{in } L^2(\mathrm{Geo}(X),\pi) $$
in the setting of the following theorem. Since our Proposition 4.2 was proved in the case where the space $X$ is bounded, we make the same boundedness assumption here. As in many of the results in [1], we could remove this assumption by requiring the density of the initial measure $\mu$ with respect to $m$ to be uniformly bounded away from zero.
Theorem 4.5. Let $m$ be a doubling measure on a bounded metric space $X$ and $\mu = \rho m \in \mathscr{P}(X)$, with $\rho > 0$ $m$-a.e. in $X$, and $\nu \in \mathscr{P}(X)$. Let $\pi \in \mathrm{GeoOpt}(\mu,\nu)$ and let $\varphi\colon X \to \mathbb{R}\cup\{-\infty\}$ be a Kantorovich potential relative to $\pi$ satisfying
$$ |\nabla^+\varphi|(\gamma_0) = d(\gamma_0,\gamma_1) \quad \text{for } \pi\text{-a.e. } \gamma \in \mathrm{Geo}(X). \tag{4.11} $$
Further assume that there exists $\bar s \in (0,1]$ such that $(e_s)_\#\pi \ll m$ for all $s \in [0,\bar s)$. Then for $\pi$-a.e. $\gamma \in \mathrm{Geo}(X)$ we have
$$ \lim_{t\downarrow 0}\frac{\varphi(\gamma_0) - \varphi(\gamma_t)}{d(\gamma_0,\gamma_t)} = d(\gamma_0,\gamma_1). $$
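As a sanity check of the conclusion of Theorem 4.5 in the simplest smooth setting (not part of the paper's argument), one can look at Euclidean space with cost $|x-y|^2/2$. The sketch below assumes the hypothetical potential $\varphi(x) = \lambda|x|^2/2$ with $0<\lambda<1$, which is $c$-concave because $|x|^2/2 - \varphi$ is convex; the optimal map is then $T(x) = x - \nabla\varphi(x) = (1-\lambda)x$, geodesics are segments, and (4.11) holds since $|\nabla\varphi(x)| = \lambda|x| = |x - T(x)|$. Euclidean space is unbounded, so this only illustrates the limit, not the hypotheses. The names `phi`, `norm` and `difference_quotient` are illustrative.

```python
import math

# Hypothetical Euclidean illustration (not from the paper): cost |x - y|^2 / 2 on R^3,
# c-concave potential phi(x) = lam * |x|^2 / 2 with 0 < lam < 1, optimal map
# T(x) = x - grad phi(x) = (1 - lam) * x, geodesics gamma_t = (1 - t) x + t T(x).

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def phi(x, lam):
    return 0.5 * lam * sum(c * c for c in x)

def difference_quotient(x0, lam, t):
    """(phi(gamma_0) - phi(gamma_t)) / d(gamma_0, gamma_t) for the segment starting at x0."""
    x1 = [(1.0 - lam) * c for c in x0]                     # gamma_1 = T(gamma_0)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]   # gamma_t on the segment
    return (phi(x0, lam) - phi(xt, lam)) / norm([a - b for a, b in zip(x0, xt)])

x0, lam = [1.0, 2.0, -0.5], 0.3
target = lam * norm(x0)   # d(gamma_0, gamma_1) = |grad phi(gamma_0)|
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"t={t:g}  quotient={difference_quotient(x0, lam, t):.8f}  target={target:.8f}")
```

In this example the quotient equals $\lambda|x_0|(1 - t\lambda/2)$ exactly, so it converges to $d(\gamma_0,\gamma_1)$ linearly in $t$.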

Proof. If the lower ascending slope along geodesics of the Kantorovich potential were continuous, the theorem would follow immediately from the fact that the lower ascending slope is an upper gradient. This is not true in general, but what we can prove using density points and cyclical monotonicity is that for $\pi$-almost every geodesic the lower ascending slope is continuous along the geodesic at its starting point.

As we have seen in the proof of Proposition 4.2, in (4.5), the inequality
$$ \liminf_{t\downarrow 0}\frac{\varphi(\gamma_0) - \varphi(\gamma_t)}{d(\gamma_0,\gamma_t)} \ge d(\gamma_0,\gamma_1) $$
holds for $\pi$-a.e. $\gamma \in \mathrm{Geo}(X)$.

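On the other hand, by Proposition 3.4 we know that $|\nabla_g^+\varphi|$ is an upper gradient of $\varphi$ along geodesics, and so for all $\gamma \in \mathrm{Geo}(X)$ the estimate
$$ \frac{\varphi(\gamma_0) - \varphi(\gamma_t)}{d(\gamma_0,\gamma_t)} \le \frac{1}{d(\gamma_0,\gamma_t)}\int_{\gamma|_{[0,t]}} |\nabla_g^+\varphi| $$
holds for all $t \in (0,1)$. So our claim follows if we can show, for any $\delta > 0$, that for $\pi$-a.e. $\gamma \in \mathrm{Geo}(X)$
$$ |\nabla_g^+\varphi|(\gamma_s) \le (1+\delta)\,d(\gamma_0,\gamma_1) \quad \text{for } \mathscr{L}^1\text{-a.e. } s \in (0,t) \tag{4.12} $$
when $t > 0$ (depending on $\delta$ and $\gamma$) is small enough.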
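For the reader's convenience, here is the short computation behind the reduction to (4.12). It only uses that a constant speed geodesic satisfies $d(\gamma_0,\gamma_t) = t\,d(\gamma_0,\gamma_1)$ and has metric speed $d(\gamma_0,\gamma_1)$ (for $\gamma_0 \ne \gamma_1$), so that (4.12) bounds the average of the slope on $[0,t]$:

```latex
\begin{aligned}
\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{d(\gamma_0,\gamma_t)}
&\le \frac{1}{t\,d(\gamma_0,\gamma_1)}\int_0^t |\nabla_g^+\varphi|(\gamma_s)\,d(\gamma_0,\gamma_1)\,ds
 = \frac{1}{t}\int_0^t |\nabla_g^+\varphi|(\gamma_s)\,ds\\
&\le (1+\delta)\,d(\gamma_0,\gamma_1).
\end{aligned}
```

Together with the liminf inequality above this gives $d(\gamma_0,\gamma_1) \le \liminf_{t\downarrow 0} \le \limsup_{t\downarrow 0} \le (1+\delta)\,d(\gamma_0,\gamma_1)$ for $\pi$-a.e. $\gamma$, and letting $\delta \downarrow 0$ along a countable sequence yields the limit.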

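Because $\rho > 0$ $m$-a.e., we know from (4.11) that for $m$-a.e. $x \in X$ there exists $\gamma^x \in \mathrm{Geo}(X)$ with $\gamma^x_0 = x$ and $|\nabla^+\varphi|(x) = d(\gamma^x_0,\gamma^x_1)$. When we combine this with the assumption $(e_s)_\#\pi \ll m$ for any $s \in (0,\bar s)$, we get for $\pi$-a.e. $\gamma \in \mathrm{Geo}(X)$ a curve $\tilde\gamma \in \mathrm{Geo}(X)$ such that $\tilde\gamma_0 = \gamma_s$ and
$$ |\nabla_g^+\varphi|(\tilde\gamma_0) = d(\tilde\gamma_0,\tilde\gamma_1). $$
Hence, by Fubini's theorem, for $\pi$-a.e. $\gamma \in \mathrm{Geo}(X)$ a curve $\tilde\gamma \in \mathrm{Geo}(X)$ with the above properties exists for $\mathscr{L}^1$-a.e. $s \in (0,\bar s)$.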

Our task is now to estimate $d(\tilde\gamma_0,\tilde\gamma_1)$ from above. Because $m$ is doubling, it is enough to prove this for those $\gamma \in \mathrm{Geo}(X)$ for which $\gamma_0$ is a Lebesgue point of $|\nabla_g^+\varphi|$.
Figure 2. The three curves $\gamma$, $\tilde\gamma$ and $\hat\gamma$ which are used in the proof.



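Let $\varepsilon > 0$ and take $t \in (0,1)$ so small that for every $0 < s < t$ and every $x \in B(\gamma_0, r_s)$, where $r_s = 2s\,d(\gamma_0,\gamma_1)$, there exists $\hat\gamma \in \mathrm{Geo}(X)$ with $\hat\gamma_0 \in B(x, \varepsilon r_s)$ and
$$ d(\hat\gamma_0,\hat\gamma_1) = |\nabla_g^+\varphi|(\hat\gamma_0) \le (1+\varepsilon)\,|\nabla_g^+\varphi|(\gamma_0) = (1+\varepsilon)\,d(\gamma_0,\gamma_1). \tag{4.13} $$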

Define $q = s\,d(\gamma_0,\gamma_1)/d(\tilde\gamma_0,\tilde\gamma_1)$ and let $x = \tilde\gamma_q \in B(\gamma_0, r_s)$. With this choice of $x$, let $\hat\gamma \in \mathrm{Geo}(X)$ be as above. Notice that we may assume $q < s$, as otherwise the upper bound on $d(\tilde\gamma_0,\tilde\gamma_1)$ follows immediately. The selected curves are illustrated in Figure 2.
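The role of $q$ can be spelled out in a few lines. Using $\tilde\gamma_0 = \gamma_s$ and the constant speed of $\gamma$ and $\tilde\gamma$, the choice of $q$ places $x = \tilde\gamma_q$ within distance $r_s$ of $\gamma_0$, makes $r_s$ expressible through $\tilde\gamma$ (this is what simplifies the estimate below), and explains why the case $q \ge s$ is trivial:

```latex
\begin{aligned}
d(\gamma_0,\tilde\gamma_q) &\le d(\gamma_0,\gamma_s) + d(\tilde\gamma_0,\tilde\gamma_q)
 = s\,d(\gamma_0,\gamma_1) + q\,d(\tilde\gamma_0,\tilde\gamma_1) = 2s\,d(\gamma_0,\gamma_1) = r_s,\\
r_s &= 2s\,d(\gamma_0,\gamma_1) = 2q\,d(\tilde\gamma_0,\tilde\gamma_1),\\
q \ge s &\iff d(\tilde\gamma_0,\tilde\gamma_1) \le d(\gamma_0,\gamma_1).
\end{aligned}
```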

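Now we are ready to estimate $d(\tilde\gamma_0,\tilde\gamma_1)$ from above. For this we use cyclical monotonicity:
$$
\begin{aligned}
d^2(\tilde\gamma_0,\tilde\gamma_1) + d^2(\hat\gamma_0,\hat\gamma_1)
&\le d^2(\hat\gamma_0,\tilde\gamma_1) + d^2(\tilde\gamma_0,\hat\gamma_1)\\
&\le \big(d(\hat\gamma_0,\tilde\gamma_q) + d(\tilde\gamma_q,\tilde\gamma_1)\big)^2 + \big(d(\tilde\gamma_0,\tilde\gamma_q) + d(\tilde\gamma_q,\hat\gamma_0) + d(\hat\gamma_0,\hat\gamma_1)\big)^2\\
&= d^2(\hat\gamma_0,\tilde\gamma_q) + 2\,d(\hat\gamma_0,\tilde\gamma_q)\,d(\tilde\gamma_q,\tilde\gamma_1) + d^2(\tilde\gamma_q,\tilde\gamma_1) + d^2(\tilde\gamma_0,\tilde\gamma_q) + 2\,d(\tilde\gamma_0,\tilde\gamma_q)\,d(\tilde\gamma_q,\hat\gamma_0)\\
&\quad + 2\,d(\tilde\gamma_0,\tilde\gamma_q)\,d(\hat\gamma_0,\hat\gamma_1) + d^2(\tilde\gamma_q,\hat\gamma_0) + 2\,d(\tilde\gamma_q,\hat\gamma_0)\,d(\hat\gamma_0,\hat\gamma_1) + d^2(\hat\gamma_0,\hat\gamma_1).
\end{aligned}
$$
Now, using $d(\hat\gamma_0,\tilde\gamma_q) < \varepsilon r_s$, $d(\tilde\gamma_0,\tilde\gamma_q) = q\,d(\tilde\gamma_0,\tilde\gamma_1)$ and $d(\tilde\gamma_q,\tilde\gamma_1) = (1-q)\,d(\tilde\gamma_0,\tilde\gamma_1)$, we get that $d^2(\tilde\gamma_0,\tilde\gamma_1) + d^2(\hat\gamma_0,\hat\gamma_1)$ is bounded from above by
$$
\begin{aligned}
&\varepsilon^2 r_s^2 + 2\varepsilon r_s(1-q)\,d(\tilde\gamma_0,\tilde\gamma_1) + (1-q)^2 d^2(\tilde\gamma_0,\tilde\gamma_1) + q^2 d^2(\tilde\gamma_0,\tilde\gamma_1)\\
&\quad + 2q\varepsilon r_s\,d(\tilde\gamma_0,\tilde\gamma_1) + 2q\,d(\tilde\gamma_0,\tilde\gamma_1)\,d(\hat\gamma_0,\hat\gamma_1) + \varepsilon^2 r_s^2 + 2\varepsilon r_s\,d(\hat\gamma_0,\hat\gamma_1) + d^2(\hat\gamma_0,\hat\gamma_1)\\
&= 2\varepsilon r_s\big(\varepsilon r_s + d(\tilde\gamma_0,\tilde\gamma_1) + d(\hat\gamma_0,\hat\gamma_1)\big) + d^2(\tilde\gamma_0,\tilde\gamma_1) + d^2(\hat\gamma_0,\hat\gamma_1)\\
&\quad + 2q\,d(\hat\gamma_0,\hat\gamma_1)\,d(\tilde\gamma_0,\tilde\gamma_1) + 2(q-1)q\,d^2(\tilde\gamma_0,\tilde\gamma_1)\\
&= 2\varepsilon r_s\big(\varepsilon r_s + d(\tilde\gamma_0,\tilde\gamma_1) + d(\hat\gamma_0,\hat\gamma_1)\big) + d^2(\tilde\gamma_0,\tilde\gamma_1) + d^2(\hat\gamma_0,\hat\gamma_1)\\
&\quad + r_s\,d(\hat\gamma_0,\hat\gamma_1) + (q-1)\,r_s\,d(\tilde\gamma_0,\tilde\gamma_1),
\end{aligned}
$$
where the last equality uses $r_s = 2s\,d(\gamma_0,\gamma_1) = 2q\,d(\tilde\gamma_0,\tilde\gamma_1)$.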
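It follows that $2\varepsilon r_s\big(\varepsilon r_s + d(\tilde\gamma_0,\tilde\gamma_1) + d(\hat\gamma_0,\hat\gamma_1)\big) + r_s\,d(\hat\gamma_0,\hat\gamma_1) + (q-1)\,r_s\,d(\tilde\gamma_0,\tilde\gamma_1) \ge 0$, so that dividing by $r_s$ and using (4.13) yields
$$ d(\tilde\gamma_0,\tilde\gamma_1) \le \frac{(1+2\varepsilon)\,d(\hat\gamma_0,\hat\gamma_1) + 2\varepsilon^2 r_s}{1-q-2\varepsilon} \le \frac{(1+2\varepsilon)(1+\varepsilon) + 4\varepsilon^2 s}{1-s-2\varepsilon}\,d(\gamma_0,\gamma_1). $$
Choosing $s$ and $\varepsilon$ small enough, depending on $\delta$, we achieve (4.12) and conclude the proof. □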
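The algebra in the last steps of the proof can be double-checked numerically. The following sketch, written for this exposition and not taken from the paper, (i) checks on random inputs that the expanded upper bound coincides with its simplified form once $r_s = 2q\,d(\tilde\gamma_0,\tilde\gamma_1)$ is substituted, and (ii) confirms that the hypothetical choice $\varepsilon = s = \delta/10$ makes the final constant at most $1+\delta$ for $\delta \in (0,1]$. The variables `dt` and `dh` stand for $d(\tilde\gamma_0,\tilde\gamma_1)$ and $d(\hat\gamma_0,\hat\gamma_1)$; all function names are illustrative.

```python
import math
import random

def expanded_bound(eps, r, q, dt, dh):
    # (eps*r + (1-q)*dt)^2 + (q*dt + eps*r + dh)^2: the two squared triangle-inequality
    # bounds after replacing d(hat gamma_0, tilde gamma_q) by eps * r_s
    return (eps * r + (1 - q) * dt) ** 2 + (q * dt + eps * r + dh) ** 2

def simplified_bound(eps, r, q, dt, dh):
    # final form of the display; equal to expanded_bound only when r = 2 * q * dt
    return (2 * eps * r * (eps * r + dt + dh) + dt ** 2 + dh ** 2
            + r * dh + (q - 1) * r * dt)

def bound_factor(eps, s):
    # constant multiplying d(gamma_0, gamma_1) in the final estimate
    return ((1 + 2 * eps) * (1 + eps) + 4 * eps ** 2 * s) / (1 - s - 2 * eps)

def check_identity(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        eps, q = rng.uniform(0.0, 0.2), rng.uniform(0.0, 1.0)
        dt, dh = rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)
        r = 2 * q * dt  # r_s = 2 s d(gamma_0, gamma_1) = 2 q d(tilde gamma_0, tilde gamma_1)
        if not math.isclose(expanded_bound(eps, r, q, dt, dh),
                            simplified_bound(eps, r, q, dt, dh), rel_tol=1e-9):
            return False
    return True

print(check_identity())  # True
for delta in (1.0, 0.5, 0.1, 0.01):
    eps = s = delta / 10  # hypothetical admissible choice of the parameters
    print(delta, bound_factor(eps, s), bound_factor(eps, s) <= 1 + delta)
```

The choice $\varepsilon = s = \delta/10$ is only one convenient option; any choice making the constant at most $1+\delta$ serves the same purpose.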